2,859 research outputs found

    Functional CLT for sample covariance matrices

    Using Bernstein polynomial approximations, we prove the central limit theorem for linear spectral statistics of sample covariance matrices, indexed by a set of functions with continuous fourth-order derivatives on an open interval including $[(1-\sqrt{y})^2,(1+\sqrt{y})^2]$, the support of the Marčenko–Pastur law. We also derive the explicit expressions for asymptotic mean and covariance functions. Comment: Published at http://dx.doi.org/10.3150/10-BEJ250 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
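For orientation, the two objects named in this abstract can be written out as follows. This is a sketch under standard assumptions that the abstract does not state: unit-variance entries, a dimension-to-sample-size ratio $y \in (0,1]$, and the notation $S_n$, $\lambda_i$, $F^{S_n}$, $p_y$, $a$, $b$, which is introduced here for illustration only.

```latex
% Linear spectral statistic of a p x p sample covariance matrix S_n
% with eigenvalues \lambda_1, ..., \lambda_p and empirical spectral distribution F^{S_n}.
\[
  \sum_{i=1}^{p} f(\lambda_i) \;=\; p \int f(x)\, dF^{S_n}(x)
\]
% Marchenko-Pastur density with ratio y in (0,1] and unit variance.
\[
  p_y(x) \;=\; \frac{\sqrt{(b-x)(x-a)}}{2\pi x y}, \qquad a \le x \le b,
  \qquad a = (1-\sqrt{y})^2, \quad b = (1+\sqrt{y})^2 .
\]
```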

    An experimental study of learned cardinality estimation

    Cardinality estimation is a fundamental but long unresolved problem in query optimization. Recently, multiple papers from different research groups have consistently reported that learned models have the potential to replace existing cardinality estimators. In this thesis, we ask a forward-thinking question: Are we ready to deploy these learned cardinality models in production? Our study consists of three main parts. Firstly, we focus on the static environment (i.e., no data updates) and compare five new learned methods with eight traditional methods on four real-world datasets under a unified workload setting. The results show that learned models are indeed more accurate than traditional methods, but they often suffer from high training and inference costs. Secondly, we explore whether these learned models are ready for dynamic environments (i.e., frequent data updates). We find that they cannot keep up with fast data updates and return large errors for different reasons. For less frequent updates, they can perform better, but there is no clear winner among them. Thirdly, we take a deeper look into learned models and explore when they may go wrong. Our results show that the performance of learned methods can be greatly affected by changes in correlation, skewness, or domain size. More importantly, their behavior is much harder to interpret and often unpredictable. Based on these findings, we identify two promising research directions (controlling the cost of learned models and making learned models trustworthy) and suggest a number of research opportunities. We hope that our study can guide researchers and practitioners to work together to eventually push learned cardinality estimators into real database systems.

    Diagnosis, Rupture Risk Evaluation and Therapeutic Intervention of Abdominal Aortic Aneurysms Using Targeted Nanoparticles

    Abdominal aortic aneurysm (AAA) disease causes dilation of the aorta that can lead to aortic rupture and death if not treated early. It is the 14th leading cause of death in the U.S. and is cited as the 10th leading cause of death in men over age 55, affecting thousands of patients and their families. To date, AAA patients have minimal access to safe and efficient imaging modalities for diagnosis as well as to pharmacotherapies. AAA is usually detected and monitored with ultrasonography or contrast-enhanced computed tomography (CT), neither of which provides the biomechanical information about AAAs that is essential for predicting rupture risk. Furthermore, there is currently no known pharmaceutical treatment that cures AAAs. Key pathological processes occurring within AAAs include inflammation, vascular smooth muscle cell apoptosis, and extracellular matrix (ECM) degradation. The deterioration of the elastic lamina in the aneurysmal wall is a consistent feature of AAAs, and the adult elastic lamina does not remodel during aneurysm progression, making it an ideal target for delivering contrast agents and treatments. In this research, we have delivered gold nanoparticles (AuNPs), a commonly used CT contrast agent, and pentagalloyl glucose (PGG) loaded nanoparticles to the AAAs in an angiotensin II (AngII) infusion induced mouse model by conjugating the nanoparticles with antibodies that target degraded elastin. Owing to this ability to target degraded elastin, we have observed a positive correlation between the quantity of locally accumulated AuNPs in the aneurysmal tissue in CT scans and the elastin damage level of the AAAs. Furthermore, AuNP accumulation was found to be negatively correlated with the mechanical properties of the AAAs, which makes AuNPs a potential non-invasive surrogate marker of AAA rupture risk.
Moreover, we have shown that targeted delivery of PGG could reverse the aortic dilation, ameliorate the inflammation, and restore the elastin as well as the mechanical properties of the aneurysmal tissue. Therefore, PGG loaded nanoparticles can be an effective treatment option for early to middle stage aneurysms to prevent disease progression.

    Studying Both Direct and Indirect Effects in Predator-Prey Interaction

    Studying and modelling the interaction between predators and prey has been one of the central topics in ecology and evolutionary biology. In this thesis, we study two different aspects of predator-prey interaction: direct effects and indirect effects. Firstly, we study direct predation between predators and prey in a patchy landscape. Secondly, we study indirect effects between predators and prey. Thirdly, we extend our previous model by incorporating a stage structure into the prey. Finally, we further extend our previous model by incorporating spatial structure into the modelling.

    Comparison of Statistical Testing and Predictive Analysis Methods for Feature Selection in Zero-inflated Microbiome Data

    Background: Recent advances in next-generation sequencing (NGS) technology enable researchers to collect a large volume of microbiome data. Microbiome data consist of operational taxonomic unit (OTU) count data characterized by zero-inflation, over-dispersion, and grouping structure among the samples. Currently, statistical testing methods based on generalized linear mixed effect models (GLMM) are commonly used to identify OTUs that are associated with a phenotype such as human diseases or plant traits. Statistical testing methods have a number of limitations, including these two: (1) the validity of the p-value/q-value depends sensitively on the correctness of the model, and (2) statistical significance does not necessarily imply predictivity. Statistical testing methods depend on model correctness and attempt to select "marginally relevant" features, not the most predictive ones. Predictive analysis using methods such as LASSO is an alternative approach for feature selection. To the best of our knowledge, this approach has not been used widely for analyzing microbiome data. Methodology: We use four synthetic datasets simulated from a zero-inflated negative binomial distribution and a real human gut microbiome dataset to compare the feature selection performance of LASSO with the likelihood ratio test methods applied to GLMMs. We also investigate the performance of cross-validation in estimating the out-of-sample predictivity of selected features in zero-inflated data. Results: Our studies with synthetic datasets show that the feature selection performance of LASSO is excellent in zero-inflated data and is comparable with the likelihood ratio test applied to the true data generating model. The feature selection performance of LASSO is better when the distributions of counts are more differentiated by the phenotype, which is a categorical variable in our synthetic datasets.
In addition, we performed leave-one-out cross-validation (LOOCV) on the training set and out-of-sample prediction on the test set. The cross-validatory (CV) predictive measures are very close to the out-of-sample predictivity measures. This indicates that LOOCV predictive metrics provide honest measures of the predictivity of the features selected by LASSO. Therefore, the CV predictive measures are good guidance for choosing cutoffs (the shrinkage parameter $\lambda$) in selecting features with LASSO. By contrast, when wrong models are fitted to a dataset, the differences between the q-values and the actual false discovery rates are huge; hence, the q-values are highly misleading for selecting features. Our comparison of LASSO and statistical testing methods (the likelihood ratio test in our analysis) on the real dataset shows that small q-values do not necessarily imply high predictivity of the selected OTUs. However, researchers often use q-values to find predictors, which is why q-values need to be examined carefully. Conclusions: Statistical testing methods perform well in zero-inflated datasets on both synthetic and real data. However, serious model checking should be conducted before we use q-values to choose features. Predictive analysis with LASSO is recommended to supplement q-values for selecting features and for measuring the predictivity of selected features.
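To make the zero-inflated negative binomial distribution mentioned in this abstract concrete, here is a minimal standard-library sketch of how such counts could be simulated. It is not the authors' simulation code: the function names and the parameter values (`zero_prob=0.6`, `mean=5.0`, `dispersion=2.0`) are hypothetical choices for illustration, and the negative binomial is drawn as a gamma-Poisson mixture.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (suited to small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_zinb(rng, zero_prob, mean, dispersion):
    """Draw one zero-inflated negative binomial count (illustrative parameterization).

    With probability zero_prob the result is a structural zero. Otherwise the
    count comes from a negative binomial with the given mean and variance
    mean + mean**2 / dispersion, drawn as a gamma-Poisson mixture.
    """
    if rng.random() < zero_prob:
        return 0
    # Gamma(shape=dispersion, scale=mean/dispersion) has expectation `mean`.
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return sample_poisson(rng, rate)


# Hypothetical parameter values, chosen only to show heavy zero inflation.
rng = random.Random(0)
counts = [sample_zinb(rng, zero_prob=0.6, mean=5.0, dispersion=2.0) for _ in range(2000)]
zero_fraction = sum(c == 0 for c in counts) / len(counts)
sample_mean = sum(counts) / len(counts)
```

In this sketch the observed share of zeros exceeds `zero_prob`, because the negative binomial component also produces zeros on its own; that excess of zeros plus over-dispersion is what the abstract refers to as zero-inflated data.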

    Design and testing of a prototype solar sanitizer

    According to the World Health Organization (WHO), many people die each year of preventable diseases such as diarrhea, cholera, pneumonia, and hepatitis A. A lack of appropriate water treatment infrastructure, improper waste disposal, and inadequate treatment of human waste are common causes of water contamination. In addition to their severe effects on local populations, similar challenges in preventing diarrheal diseases confronted the United States Army during recent deployments in Iraq and Afghanistan. The common ways for the United States Army to treat human waste are burning, chemical disinfection, and burying. Burning and chemical disinfection are harmful to the environment; moreover, both are expensive in terms of the fuel and chemicals used during waste treatment. Burying waste is cheaper, but if done improperly it could stimulate the growth and spread of bacteria in water sources such as rivers and wells. Therefore, there is a significant need for an effective low-cost, low-technology solution that prevents the spread of the above-mentioned diseases among civilians in developing countries and in the United States Army. We have developed and tested a device that meets these needs. The device, which we call the Solar Sanitizer, takes advantage of infrared sunlight to kill the bacteria in human waste. This thesis describes the motivation, design, prototype development, and preliminary testing of the Solar Sanitizer.